Basic Statistics

Raw Counts

Name                   Value
Rows                   45,518
Columns                311
Discrete columns       17
Continuous columns     294
All missing columns    0
Missing observations   84,199
Complete rows          767
Total observations     14,156,098
Memory allocation      107.9 MB

Percentages
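The percentages can be recovered from the raw counts. This assumes each percentage uses the matching total as its denominator: columns for the column counts, rows for complete rows, and total observations for missing observations. The counts are also internally consistent, since rows × columns gives the total observations and discrete + continuous columns gives all columns.

$$
\begin{aligned}
\text{Total observations} &= 45{,}518 \times 311 = 14{,}156{,}098\\
\text{Discrete columns} &= 17/311 \approx 5.47\%\\
\text{Continuous columns} &= 294/311 \approx 94.53\% \quad (17 + 294 = 311)\\
\text{All missing columns} &= 0/311 = 0\%\\
\text{Complete rows} &= 767/45{,}518 \approx 1.69\%\\
\text{Missing observations} &= 84{,}199/14{,}156{,}098 \approx 0.59\%
\end{aligned}
$$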

Data Structure

Missing Data Profile

Univariate Distribution

Histogram

Bar Chart (with frequency)

## 11 columns ignored with more than 50 categories.
## fullName: 588 categories
## city: 354 categories
## state: 58 categories
## home_arena_name: 356 categories
## home_city: 292 categories
## away_arena_name: 356 categories
## away_city: 292 categories
## arena_city: 354 categories
## arena_state: 58 categories
## game_date: 1332 categories
## week_end: 211 categories
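The rule behind these messages is that a discrete column is skipped when it has more than `maxcat` distinct values. Here the threshold is 50; the correlation step below reports 20 and the PCA step reports 50. The following is a minimal Python sketch of that rule, not DataExplorer's implementation. The function name `ignored_columns` and the choice to leave missing values out of the category count are assumptions.

```python
from typing import Dict, Optional, Sequence


def ignored_columns(
    columns: Dict[str, Sequence[Optional[str]]], maxcat: int = 50
) -> Dict[str, int]:
    """Return {column: number of distinct categories} for columns above maxcat.

    Assumption: missing values (None) are not counted as a category.
    """
    ignored = {}
    for name, values in columns.items():
        n_categories = len({v for v in values if v is not None})
        if n_categories > maxcat:  # "more than maxcat categories"
            ignored[name] = n_categories
    return ignored


# Toy data (hypothetical): one high-cardinality column, one low-cardinality one.
toy = {
    "game_date": [f"2019-01-{d:02d}" for d in range(1, 31)],
    "state": ["NC", "KY", None, "NC"],
}
print(ignored_columns(toy, maxcat=20))  # {'game_date': 30}
```

Columns above the threshold are excluded from the bar charts. This keeps each plotted column to a readable number of bars; the trade-off is that high-cardinality identifiers such as dates and names are not charted.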

QQ Plot

## Warning: Removed 1855 rows containing non-finite values (stat_qq).
## Warning: Removed 1855 rows containing non-finite values (stat_qq_line).
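ggplot2's `stat_qq()` removes non-finite values (`NA`, `NaN`, `Inf`). It then pairs each sorted sample value with a theoretical quantile, which comes from the standard normal distribution by default. The two warnings above count the rows removed by the point layer (`stat_qq`) and the reference-line layer (`stat_qq_line`).

The Python sketch below reproduces the point computation. It assumes R's `ppoints()` convention for the plotting positions: a = 3/8 when n ≤ 10, otherwise 1/2. The names `ppoints` and `qq_points` are used here for illustration only.

```python
import math
from statistics import NormalDist
from typing import List, Sequence, Tuple


def ppoints(n: int) -> List[float]:
    """Plotting positions following R's ppoints(): (i - a) / (n + 1 - 2a)."""
    a = 3 / 8 if n <= 10 else 1 / 2
    return [(i - a) / (n + 1 - 2 * a) for i in range(1, n + 1)]


def qq_points(sample: Sequence[float]) -> Tuple[int, List[Tuple[float, float]]]:
    """Drop non-finite values, then pair theoretical and sample quantiles.

    Returns (number of removed values, [(theoretical, sample), ...]).
    """
    finite = sorted(x for x in sample if math.isfinite(x))
    removed = len(sample) - len(finite)
    normal = NormalDist()  # standard normal (mean 0, sd 1)
    theoretical = [normal.inv_cdf(p) for p in ppoints(len(finite))]
    return removed, list(zip(theoretical, finite))


removed, points = qq_points([3.0, float("nan"), 1.0, float("inf"), 2.0])
print(removed)  # 2
```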

QQ Plot (by home_winner_response)

## Warning: Removed 1857 rows containing non-finite values (stat_qq).
## Warning: Removed 1857 rows containing non-finite values (stat_qq_line).

Correlation Analysis

## 15 features with more than 20 categories ignored!
## fullName: 114 categories
## city: 96 categories
## state: 40 categories
## home_arena_name: 74 categories
## home_city: 71 categories
## home_state: 36 categories
## away_arena_name: 76 categories
## away_city: 73 categories
## away_state: 39 categories
## arena_city: 96 categories
## arena_state: 40 categories
## game_date: 510 categories
## week_end: 168 categories
## home_ap_rank_fct: 25 categories
## away_ap_rank_fct: 25 categories
## Warning in cor(x = structure(list(game_id = c(313330194, 313360097, 313360183, : the standard deviation is zero
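The final warning comes from R's `cor()`. Pearson's r divides the covariance by the product of the two standard deviations, so a column that is constant in the rows being correlated has no defined correlation. In that case `cor()` returns `NA` and emits this warning.

The Python sketch below shows that behaviour, returning NaN where R returns `NA`. The function name `pearson` is hypothetical.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; NaN when either input has zero standard deviation."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # R's cor() returns NA here (with a warning); this sketch returns NaN silently.
        return math.nan
    return sxy / math.sqrt(sxx * syy)


print(pearson([1, 2, 3], [2, 4, 6]))  # 1.0
print(pearson([1, 2, 3], [5, 5, 5]))  # nan
```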

Principal Component Analysis

## 9 features with more than 50 categories ignored!
## fullName: 114 categories
## city: 96 categories
## home_arena_name: 74 categories
## home_city: 71 categories
## away_arena_name: 76 categories
## away_city: 73 categories
## arena_city: 96 categories
## game_date: 510 categories
## week_end: 168 categories
## Warning in (function (data, variance_cap = 0.8, maxcat = 50L, prcomp_args = list(scale. = TRUE), : The following features are dropped due to zero variance:
##  * type
##  * home_flagrant_fouls_roll5
##  * home_flagrant_fouls_opp_
##  * away_flagrant_fouls_
##  * away_flagrant_fouls_opp_roll5
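With `prcomp_args = list(scale. = TRUE)`, every feature is standardized to unit variance before the rotation, which means dividing by each feature's standard deviation. A zero-variance feature, such as `type` or the flagrant-foul columns listed above, cannot be rescaled this way, so it is dropped first.

The Python sketch below covers only that pre-processing step, not the PCA itself. The names `drop_zero_variance` and `standardize` are hypothetical, and `home_fouls` is a made-up column in toy data.

```python
import statistics
from typing import Dict, List, Sequence, Tuple


def drop_zero_variance(
    columns: Dict[str, Sequence[float]]
) -> Tuple[Dict[str, Sequence[float]], List[str]]:
    """Split columns into those with positive variance and those without."""
    kept, dropped = {}, []
    for name, values in columns.items():
        if statistics.pvariance(values) == 0:
            dropped.append(name)  # constant column: cannot be scaled
        else:
            kept[name] = values
    return kept, dropped


def standardize(values: Sequence[float]) -> List[float]:
    """Centre and divide by the sample standard deviation (as scale. = TRUE does)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # would be 0 for a constant column
    return [(v - mean) / sd for v in values]


# Toy data (hypothetical): "type" is constant, "home_fouls" varies.
toy = {"type": [1, 1, 1, 1], "home_fouls": [10, 12, 14, 16]}
kept, dropped = drop_zero_variance(toy)
print(dropped)  # ['type']
print(standardize(kept["home_fouls"]))
```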

Bivariate Distribution

Boxplot (by home_winner_response)

## Warning: Removed 84199 rows containing non-finite values (stat_boxplot).
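The 84,199 rows removed here equal the Missing observations total in the raw counts above. One plausible reading, offered as an assumption, is that the boxplots are drawn from the continuous features reshaped to long format, with one row per feature value. Every missing cell would then become exactly one non-finite row that `stat_boxplot()` drops. This reading also implies that the missing cells sit in continuous columns.

The Python sketch below performs that reshape on toy data and counts the rows that would be dropped. The names `to_long`, `non_finite_rows`, `home_pts` and `away_pts` are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Row = Tuple[str, str, Optional[float]]  # (group, feature, value)


def to_long(
    by: Sequence[str], features: Dict[str, Sequence[Optional[float]]]
) -> List[Row]:
    """Reshape wide continuous columns into one row per (group, feature, value)."""
    return [
        (group, name, value)
        for name, values in features.items()
        for group, value in zip(by, values)
    ]


def non_finite_rows(rows: List[Row]) -> int:
    """Count rows a boxplot layer would drop: missing (None) or non-finite values."""
    return sum(1 for _, _, v in rows if v is None or not math.isfinite(v))


# Toy data (hypothetical): 3 games, 2 continuous features, 2 missing cells in total.
home_winner_response = ["1", "0", "1"]
features = {"home_pts": [80.0, None, 75.0], "away_pts": [70.0, 68.0, float("nan")]}
rows = to_long(home_winner_response, features)
print(len(rows), non_finite_rows(rows))  # 6 2
```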

Scatterplot (by home_winner_response)